Low-Cost Supervision for Multiple-Source Attribute Extraction
Abstract
Previous studies on extracting class attributes from unstructured text consider either Web documents or query logs as the source of textual data. Web search queries have been shown to yield attributes of higher quality. However, since many relevant attributes found in Web documents occur infrequently in query logs, Web documents remain an important source for extraction. In this paper, we introduce Bootstrapped Web Search (BWS) extraction, the first approach to extracting class attributes simultaneously from both sources. Extraction is guided by a small set of seed attributes and does not rely on further domain-specific knowledge. BWS is shown to improve both extraction precision and attribute relevance across 40 test classes.
Similar resources
AMBER: Automatic Supervision for Multi-Attribute Extraction
The extraction of multi-attribute objects from the deep web is the bridge between the unstructured web and structured data. Existing approaches either induce wrappers from a set of human-annotated pages or leverage repeated structures on the page without supervision. What the former lack in automation, the latter lack in accuracy. Thus accurate, automatic multi-attribute object extraction has r...
Two Stage Multiple Attribute Decision Making Problem in Iranian Gas Distribution Systems
The purpose of this paper is to present the possibility of replacing the physical unit cost in transportation or distribution problems with an aggregate coefficient, so that qualitative and subjective considerations are taken into account. The model for constructing the aggregate cost is a two-stage multiple attribute decision-making problem. In the first stage supply points, demand points and routes of transportatio...
Weakly Supervised User Profile Extraction from Twitter
While user attribute extraction on social media has received considerable attention, existing approaches, mostly supervised, encounter great difficulty in obtaining gold standard data and are therefore limited to predicting unary predicates (e.g., gender). In this paper, we present a weakly supervised approach to user profile extraction from Twitter. Users’ profiles from social media websites su...
Relation Extraction Using TBL with Distant Supervision
Supervised machine learning methods have been widely used in relation extraction, which finds the relation between two named entities in a sentence. However, their disadvantages are that constructing training data is costly and time-consuming, and that the machine learning system depends on the domain of the training data. To overcome these disadvantages, we construct a weakly labeled data se...
Information Extraction in Illicit Web Domains
Extracting useful entities and attribute values from illicit domains such as human trafficking is a challenging problem with the potential for widespread social impact. Such domains employ atypical language models, have ‘long tails’ and suffer from the problem of concept drift. In this paper, we propose a lightweight, feature-agnostic Information Extraction (IE) paradigm specifically designed f...